LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering

Authors

Abstract

Video Question Answering (VideoQA), aiming to correctly answer a given question based on understanding multimodal video content, is challenging due to the richness of video content. From the perspective of video understanding, a complete VideoQA framework needs to understand video content at different semantic levels and flexibly integrate the diverse video content to distill question-related content. To this end, we propose a Lightweight Visual-Linguistic Reasoning framework named LiVLR. Specifically, LiVLR first utilizes graph-based visual and linguistic encoders to obtain multi-grained visual and linguistic representations, respectively. Subsequently, the obtained representations are integrated with the devised Diversity-aware Visual-Linguistic Reasoning module (DaVL). DaVL distinguishes the different types of representations by a learnable index embedding in the graph embedding. Therefore, DaVL can flexibly adjust the importance of different types of representations when generating the question-related joint representation. The proposed LiVLR is lightweight and shows its performance advantage on three VideoQA benchmarks: MSRVTT-QA, KnowIT VQA, and TVQA. Extensive ablation studies demonstrate the effectiveness of LiVLR's key components.
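
The abstract says only that DaVL tells representation types apart through a learnable index embedding and then weights them when forming a joint representation. The sketch below is one hypothetical reading of that idea, not the authors' implementation. The function names, the `type_table` lookup, and the scaled dot-product attention pooling are all assumptions made for illustration, and the type table is fixed here where a real model would train it.

```python
import math


def add_type_embedding(nodes, type_ids, type_table):
    """Add each node's type-index embedding to its feature vector.

    type_table[t] stands in for a learnable embedding of type t
    (an assumption: e.g. 0 = visual node, 1 = linguistic node).
    """
    return [
        [x + e for x, e in zip(node, type_table[t])]
        for node, t in zip(nodes, type_ids)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_representation(nodes, type_ids, type_table, question):
    """Pool type-tagged nodes into one vector, weighted by the question.

    Scaled dot-product attention is an assumed stand-in for the
    reasoning module; the source does not specify the pooling used.
    """
    tagged = add_type_embedding(nodes, type_ids, type_table)
    dim = len(question)
    scores = [
        sum(q * x for q, x in zip(question, node)) / math.sqrt(dim)
        for node in tagged
    ]
    weights = softmax(scores)
    joint = [
        sum(w * node[i] for w, node in zip(weights, tagged))
        for i in range(dim)
    ]
    return joint, weights
```

Because the type embedding is added before the attention scores are computed, two nodes with identical features but different type indices can receive different weights, which is the behavior the abstract attributes to the index embedding.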

Similar resources

Image-Question-Linguistic Co-Attention for Visual Question Answering

Our project focuses on VQA: Visual Question Answering [1], specifically, answering multiple-choice questions about a given image. We start by building a Multi-Layer Perceptron (MLP) model with question-grouped training and softmax loss. GloVe embeddings and ResNet image features are used. We are able to achieve near state-of-the-art accuracy with this model. Then we add image-question coattention [...

Explicit Knowledge-based Reasoning for Visual Question Answering

We describe a method for visual question answering which is capable of reasoning about contents of an image on the basis of information extracted from a large-scale knowledge base. The method not only answers natural language questions using concepts not contained in the image, but can provide an explanation of the reasoning by which it developed its answer. The method is capable of answering f...

Linguistic Knowledge And Question Answering

The availability of robust and deep syntactic parsing can improve the performance of all modules of a Question Answering system. In this article, this is illustrated using examples from our QA system Joost, a Dutch QA system which has been used for both open and closed domain QA. The system can make use of information found in the fully parsed version of the document collections. We demonstrate...

QuestionCube: a Framework for Question Answering

QuestionCube is a framework for Question Answering (QA) that combines several techniques to retrieve passages containing the exact answers for natural language questions. It exploits: (a) Natural Language Processing algorithms for question and candidate answers analysis both in English and Italian; (b) Information Retrieval probabilistic models for candidate answers retrieval and (c) Machine Le...

Knowledge and Reasoning for Medical Question-Answering

Restricted domains such as medicine set a context where question-answering is more likely expected to be associated with knowledge and reasoning (Mollá and Vicedo, 2007; Ferret and Zweigenbaum, 2007). On the one hand, knowledge and reasoning may be more necessary than in open-domain question-answering because of more specific or more difficult questions. On the other hand, it may also be more m...

Journal

Journal title: IEEE Transactions on Multimedia

Year: 2022

ISSN: 1520-9210, 1941-0077

DOI: https://doi.org/10.1109/tmm.2022.3185900